wasm - fix debug info & misc #96

Merged
merged 6 commits into from
Jan 3, 2024

Conversation

Luukdegram
Collaborator

No description provided.

When one or more symbols point to the same function (and body), we would
previously write the same function body multiple times. Now we instead
deduplicate them and point all aliased symbols to the same atom to ensure
we emit a function and its body just once.
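A minimal Python sketch of the deduplication idea, not the linker's actual code. The `Atom` class, `build_atoms`, and the `(name, function_index, body)` tuple layout are all hypothetical names chosen for illustration; the only point is that aliased symbols end up sharing one atom.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """One function body to emit into the code section (hypothetical type)."""
    body: bytes
    symbols: list = field(default_factory=list)


def build_atoms(symbols):
    """symbols: iterable of (name, function_index, body) tuples.

    Symbols that share a function index are treated as aliases and are
    mapped to a single shared atom, so the body is stored only once.
    """
    atoms = []
    atom_by_function = {}
    symbol_to_atom = {}
    for name, func_index, body in symbols:
        atom = atom_by_function.get(func_index)
        if atom is None:
            # First symbol for this function: create its atom.
            atom = Atom(body=body)
            atom_by_function[func_index] = atom
            atoms.append(atom)
        atom.symbols.append(name)
        symbol_to_atom[name] = atom
    return atoms, symbol_to_atom


symbols = [
    ("foo", 0, b"\x00\x0b"),
    ("foo_alias", 0, b"\x00\x0b"),
    ("bar", 1, b"\x00\x41\x00\x1a\x0b"),
]
atoms, symbol_to_atom = build_atoms(symbols)
```

Here `foo` and `foo_alias` resolve to the same `Atom` object, so only two bodies would be written for three symbols.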
Rather than expensively iterating to the first atom to then iterate
over all the atoms back to the end, we now simply start from the end
and allocate the last atom as the first atom onwards. This simplifies
the logic and we do not have to iterate atoms twice.
Previously we would only mark debug sections if they contained relocations
that targeted a marked symbol. However, no debug sections would get parsed,
as they are not represented by exported symbols and would therefore never
be marked and parsed themselves. Now, we create synthetic symbols for all
debug sections and always mark custom sections alive so that we emit them
correctly.
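As a rough illustration, the marking step can be pictured as a worklist traversal in which every debug section contributes one always-live synthetic symbol. This is a Python sketch under assumptions: `mark_live`, the `references` map, and the `section:` naming scheme are invented for the example and do not reflect the linker's real data structures.

```python
def mark_live(exported, references, debug_sections):
    """Return the set of live symbol names.

    exported: names of exported symbols (the usual roots).
    references: dict mapping a symbol name to the names it references.
    debug_sections: names of the debug sections found in the input.
    """
    worklist = list(exported)
    # One synthetic symbol per debug section, always treated as a root,
    # so the section is kept even though nothing exports it.
    worklist.extend(f"section:{name}" for name in debug_sections)

    live = set()
    while worklist:
        sym = worklist.pop()
        if sym in live:
            continue
        live.add(sym)
        worklist.extend(references.get(sym, ()))
    return live


references = {"main": ["helper"], "dead": ["helper"]}
live = mark_live(["main"], references, [".debug_info", ".debug_line"])
```

Both debug sections end up in `live` alongside `main` and `helper`, while the unreferenced `dead` symbol does not.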
The code and function sections must match in order within the binary.
Previously we would order the code section before writing it to disk.
However, this meant the atoms were already allocated and their offsets
were based on the previous order. As a result, debug info was incorrect.
We now order the atoms before allocation, and ensure synthetic functions
are created after allocation, but appended correctly to the code section
to ensure they are emitted last, and therefore have correct offsets.
When calculating the function offset for its relocation we would
previously use the atom's offset with a fixed additional offset. However,
we must include the size of the previous function's body which is LEB128-
encoded. This means we cannot use a fixed-size offset to calculate the
function body offset within the code section unless we use a fixed-size
LEB encoding, which would increase the binary size. Instead, we simply
recalculate the atom's offset based on the currently written bytes during
atom writing as we will not need this offset until we perform relocations
for the debug section anyway.
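A short Python sketch of why a fixed additional offset breaks down. `uleb128` is the standard unsigned LEB128 encoding; `write_code_bodies` is a hypothetical, simplified writer that ignores the section header and function count, and recording the offset just past the size prefix is an assumption made for the example.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def write_code_bodies(bodies):
    """Write size-prefixed bodies and record where each body starts.

    offsets[i] is the position of body i within the payload, taken from
    the number of bytes written so far rather than from a fixed stride.
    """
    payload = bytearray()
    offsets = []
    for body in bodies:
        payload += uleb128(len(body))
        offsets.append(len(payload))
        payload += body
    return bytes(payload), offsets


bodies = [bytes(5), bytes(200), bytes(3)]
payload, offsets = write_code_bodies(bodies)
```

The 200-byte body needs a two-byte size prefix, so its start is at offset 8 rather than the 7 that a fixed one-byte prefix would predict, and every later offset shifts with it.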
Rather than ordering the atoms of the code section earlier, we simply
skip it until we write the actual code section. This is possible because
we don't need to know the offset of each atom until we perform the
relocations of the debug sections, which we already delay to writing of
those sections. Debug sections *must* always come after the module,
including the code section. Therefore this is fine to rely on.
We now do a lot less work and make the codebase simpler as well.
This also allows us to remove the `next` field on atoms, reducing the
memory usage ever so slightly.
@Luukdegram Luukdegram merged commit 3e692d9 into kubkon:main Jan 3, 2024
4 checks passed
@Luukdegram Luukdegram deleted the wasm-fixes branch January 3, 2024 05:58